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Abstract 

The formalism of Wiener filtering is applied to reconstruct, in terms of spherical harmon- 
ics, the projected 4ir galaxy distribution of a mock IRAS 1.2 Jy catalog with a 'Zone 
of Avoidance' \b\ = 15°. The Singular Value Decomposition algorithm is utilized as an 
extra-regularizer in order to stabilize the inversion. This case sheds light on the amount 
of useful information contained in the data, and also suggests a method for obtaining an 
'optimal' functional basis, based on our prior knowledge, for the purposes of likelihood 
analysis and Wiener reconstruction of cosmological data sets. The applicability of such a 
functional basis for redshift galaxy catalogs is also discussed 

Introduction 

Galaxy catalogs provide the main data base for the purpose of mapping the (continuous) 
cosmological dynamical fields (density, velocity and gravitational potential), which in turn offer 
a probe of the early universe and the nature of the primordial perturbation field. However, 
continuous density maps, constructed from the discrete galaxy distribution, suffer from Poisson 
'shot-noise'. This problem become much more severe in the case of magnitude limited samples, 
where the selection function magnifies the shot-noise contribution as the probability of observing 
galaxies decreases. 

Two other problems appear in analyzing the distribution of galaxies. First, the incomplete 
sky coverage, due to the obscuration by gas and dust in our local galaxy, the so-called 'Zone 
of Avoidance' (ZOA), poses an obstacle in mapping the whole-sky distribution. Second, the 
peculiar velocities in redshift surveys give a distorted picture of the galaxy distribution, even 
in the linear regime of gravitational instability. 

Recovering the underlying field from the measured field can be viewed as an inversion problem 



in the following sense. Consider the case in which the observed data points, d = {di}(i = 
1, . . . , M) are given by a linear convolution or mapping of the underlying field, s = {s„}(q = 
1, . . . , iV), namely, 

d = Rs + e, (1) 

where R is an M X N matrix which represents a response function and e = {e,} (i = 1, . . . , M), 
is the statistical uncertainty vector associated with the data. Here the standard notion of the 
response function is extended to include any theoretical relationship between two fields (e.g., 
the matrix which relates the redshift space density to the real space density [4] and [19]). 
Mathematically, the act of reconstruction amounts to solving equation 1 for s given d where 
R is assumed to be known. However, direct inversion of equation 1, i.e., s = R _1 d, suffers 
from several problems, the most important of which are: First, the number of independent data 
points is often smaller than the number of degrees of freedom (d.o.f.) of the underlying field, 
and therefore the data do not contain enough information to constrain all of the d.o.f.. Second, 
the inversion might be unstable due to the existence of statistical noise. 

As a result of these potential pitfalls of direct inversion, the technique of Wiener filtering (c./., 
[14], [20]) as a regularization method is presented and applied for the case of the projected 
mock IRAS 1.2 Jy catalog with |b| = 15° ZOA. 

The Singular Value Decomposition algorithm (SVD) [13] as an extra-regularizer is also applied. 
An extension of the SVD algorithm is presented and discussed in the context of Wiener recon- 
struction and maximum likelihood analysis of 3-D redshift catalogs ([20]). 

2 Wiener Filter 

The derivation of the Wiener filter (WF) requires prior knowledge of the the first two moments 
of the underlying field, namely, <^s^> (taken in what follows to be for simplicity), and its 

covariance matrix, S = (ss f ) = {(^s*)}. (see [9] [4] and [3] for various applications of WF). 
Define an estimator of the underlying field, s MV (MV stands for minimal variance), as the linear 
combination of the data, s MV = Fd, where F is an N X M matrix,, chosen to minimize the 
variance of the residual r defined by 

(rrt) = ((s-s MV )(st-s Mvt )). (2) 
Carrying out the minimization of equation 2 with respect to F one finds the so-called WF, 

F = (sd^dd 1 )" 1 = SR f (RSR f + N)-\ (3) 

where N = is the noise correlation matrix, which is assumed to be uncorrelated with 

the underlying field. WF includes two operations: inversion of the response function and sup- 
pression of the shot noise roughly by the ratio of — . prt ° r . — . Note that this ratio is less than 

1 o j j prior + noise 

unity, and therefore the method can not be used iteratively as successive applications of the 
WF would drive the recovered field to zero (see [20] for more details). 

While earlier applications of the WF have focused on estimation, namely suppressing the noise 
in the measured quantities, one can extend the application of the WF to predict the values of 
unmeasured quantities, such as the density field in un-sampled regions of space, or to decon- 
volve blurred data. Within the context of linear gravitational instability theory, the WF can 
also be used to perform dynamical reconstruction of one field which is dynamically related to 
some other observed field. This is the case, for example, of the reconstruction of the real space 
galaxy distribution from its redshift distribution. 



2.1 Conditional Probability 



The standard model of cosmology assumes that the primordial perturbation field is Gaussian, 
and therefore on large scales where the fluctuations are still small the perturbations field will 
be very close to Gaussian, even in the present epoch. The statistical properties of any Gaussian 
random field depend only on its two-point covariance matrix S [1]. If the noise is also a Gaussian 
field, then, since it is independent, one can construct simple estimates of the underlying field 
s, based on the knowledge of the full probability distribution function. 

In our case, if both the signal and the noise are Gaussian, the conditional mean value of the 
field given the data can also serve as an estimator of s, / sP(s|d) ds, which yields s mean = s MV . 
s MV coincides also with the maximum a posteriori estimate (MAP) of the field; namely the one 
that maximizes the conditional probability P(s|d). Therefore for Gaussian fields WF coincides 
with the mean and the MAP estimators, i.e., s MV = s mean = s MAP . The residual from the 
mean coincides with r, and it has Gaussian distribution with zero mean and covariance matrix 
(S -1 + R^N _1 R) 1 . This property is the basis of the constrained realization method [6]. 
Another estimator can be formulated from the point of view of Bayesian statistics. The main 
objective of this approach is to calculate the posterior probability of the model given the data, 
which is written according to Bayes' theorem as P(model|data) oc P(data|model)P(model). 
The estimator of the underlying field (i.e., model, in Bayes' language) is taken to be the one 
that maximizes P(model|data), which is the most probable field. One can show that in the 
case in which the prior assumed to be a Gaussian, the Bayesian estimator coincides with the 
s MV ([20], [8], [17]). 

3 Spherical Harmonics Expansion and SVD 

Spherical harmonics (SH) have been used to probe the large scale structure from wide angle 
galaxy surveys ([11], [12], [15], [16]). These analyses consist of expanding the angular galaxy dis- 
tribution in a set of spherical harmonics which form an orthonormal functional basis when the 
expansion is carried out over the full, 47r, sky. The expansion coefficients in SH basis constitute 
a statistically orthogonal set (like the case of k-space), which makes it more comfortable to deal 
with the data in the case of full sky coverage. 

Consider an underlying angular density field, given in terms of its spherical harmonic expansion, 
a im = / dr <5(r) y ; * m (r), where the projected surface density is given by S(r) = J2im a imYi m (i)- 
This field is sampled by a finite discrete distribution of galaxies, which suffers basically from 
incomplete sky coverage and shot-noise. The observed harmonics c; m are related to the under- 
lying whole sky harmonics a; m by: c; m = J2i' m ' Wff}" 1 {a;/ m / + cr/w} where the monopole term 
(/' = 0) is excluded [15]. <r; m is the shot-noise in the 'true' number- weighted harmonics a; m 's. 
The noise variance is estimated as ^c 2 ^) = A/" (the mean number of galaxies per steradian, in- 
dependent of fin this case ). Notice that W$ m , the harmonic transform of the mask, introduce 
'cross-talk' between different harmonics. 

Applying equation 3 to this case, it can be shown that the solution for this inversion is ([9], 
[20]) 

a MV = d iag { J ^}w-\ ee B-c, (4) 

where the vectors a and c represent the sets {a; m } and {c; m }. Here A\ is the cosmic variance 
in the harmonics, which is determined by the assumed power spectrum (i.e., the prior ). 
Here, this method is applied to a standard CDM simulation characterized by fi ^ = 0.5 with 
particles selected to mimic the IRAS 1.2 Jy galaxy catalog with ZOA |6| = 15° (see [9] for the 
case of the true IRAS 1.2 Jy data with ZOA |6| = 5°). The simulation evolves the particles until 
the rms variance of the density field in a sphere of 8Mpc/h reache <r 8 = 0.62 (see [5] for more 
details). Figure 1 shows the harmonic reconstruction of the projected counts of the simulation 



using the raw harmonic coefficients, c; m , up to l max = 15. In this case the direct inversion of 
the matrix B is unstable and yields excessive power on small scales. However using the SVD 
algorithm to invert the matrix B sheds a new light on the question of the amount of useful 
information contained in the data. 




Figure 1: Harmonics reconstruction of the projected counts of the mock IRAS 1.2 Jy catalog with a 'Zone 
of Avoidance' |6| = 15°, in Galactic Aitoff projection, using the observed coefficients c; ra , up to l max = 15. 
The contour levels of the projected surface number density are in steps of 100 galaxies per steradian (the mean 
projected density is M ~ 400 galaxies per steradian). 

Figure 2: The sorted spectrum of the singular values of the matrix B, for the |6| = 15° case. The 'knee' 
suggests how to choose the cutoff for the SVD, \ m in/^max = 0.565, at the mode 184 (l ma x ~ 13)- 

Essentially, the SVD algorithm decompose the square matrix B into the product of three 
matrices B = U diag{Xi} U T (SVD can also be applied to non-square matrices [13]). The A,'s 
and the rows of the matrix U, are the eigen- values and eigen- vectors of the matrix B. After 
performing the decomposition, the inversion of B is trivial, 

B 1 = U T diag{l/Xi} U. (5) 

Formally speaking, equation 4 has a unique solution if and only if B is a non-singular matrix, 
namely if A, 7^ for all i. However a meaningful solution to equation 4 can be obtained 
even in the case where B is singular, by requiring the solution to minimize the norm of the 
residuals, |Ba — c|. Such a solution is obtained by substituting 1/A,- = in the expression for 
the inverse (equation 4) for any A, = ([13]). Realistically, 1/A,- is set to zero for any A, less 
than some lower limit, determined by the physical and geometrical properties of the system. 
The question of how, in general, one should set the lower limit is discussed in §4 . In general, 
the eigenvalues measure the amount of 'information' carried by each mode in the problem; 
the small eigenvalues are those that might destabilize the inversion, so they do notcontribute 
significantly to the reconstruction. 

Figure 2 shows the sorted spectrum of the eigenvalues (A,) of B, versus the harmonic number, 
I. The 'knee' in this figure suggests that the cutoff for the SVD, should be at A m ,„ = 0.565\ max . 
Note that this cutoff suggest that only the largest 184 modes are significant (/ 13). Figure 3 
shows the reconstructed a; m 's map using WF and the above cutoff for SVD. Note that here, 
contrary to the |6| =5° case discussed by Lahav et al. [9] where the structure was recovered in 
the ZOA, the reconstructed ZOA remains empty. This illustrates that WF, even with the use 
of SVD extra-regularization, can not create structures out of nothing, unless the structures are 
dictated by the correlations ([9]). 
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Figure 3: The reconstructed a; ra 's using Wiener filter and the cutoff shown in figure 2 

Figure 4: The correlation function for SCDM model with 8Mpc/h smoothing, as measured in the redshift 
space according to the exact linear calculation around point at 500Mpc/h distance. Z is the l.o.s. direction. 

4 'Optimal' Functional Basis 

Frequently, it is useful to expand the cosmological perturbation field as a linear combination 
of a complete orthogonal functional basis. Such a basis defines a space in which all of the 
physical, statistical and geometrical properties of the field are retained. In the framework 
of linear gravitational instability, spaces of a special interest are Fourier space ([12]) and the 
space defined by SH and spherical Bessel functions ([4], [20]). These spaces are commonly 
used for the following reasons: a) Their expansion coefficients are statistically orthogonal (e.g., 
(^k <*>]£'} = P(k)5i)(\i — k')). b) They are eigen-functions of the Laplacian operator (and the V 
operator, in case of Fourier space). 

However, the problems from which galaxy catalogs suffer complicate the picture since they 
introduce cross-talk between different modes, both in the signal covariance matrix itself and 
in the noise correlation matrix, as shown in §3 just above equation 4. For instance, in the 
linear regime, the redshift distortion, due to its anisotropy 'squeezes' the configuration space 
correlation function along the line of sight. Figure 4 shows the correlation matrix in redshift 
configuration space, around R = 500Mpc/ h, for SCDM model in the framework of linear theory 
([7], [19]). In k-space this is reflected in a cross-talk between different Fourier modes [19]. 
A different approach to the question of the functional basis, is to construct a statistically 
orthogonal functional basis according to our prior knowledge of the survey geometry, clustering 
properties and weighting scheme. 

One possibility, which is useful for Wiener reconstruction, is to find the eigen-functions and 
eigen-values of the matrix F defined by equation 3. Indeed this is the case presented in §3, 
where the expansion was used to stabilize the inversion by means of data compression (i.e., 
excluding small eigen-values). However, that was is a special case, in the sense that it had a 
characteristic scale (defined by the mask width) that corresponds to the position of the 'knee' 
in figure 2. Generally speaking, diagonalizing F does not guarantee such a 'natural' choice for 
the cutoff. Moreover, this expansion is not, necessarily, useful for other cases, especially for 
likelihood analysis. 

Another possibility, is to find the eigen-functions of the matrix <^dd^ (signal + noise) [10], 
which appears both in WF and in the Gaussian likelihood probability. However, the main 
problem of this approach, is that one can't distinguish between the contribution of the signal 
and the contribution of the noise to the eigen-values. Therefore it can not be used for data 
compression, nor for stabilizing the inversion, without throwing away useful information. 
Here we suggest an 'improved SVD' approach, which basically solves the following generalized 
eigen-value problem, 

RSR f a,- = A,-Na,- (6) 



where a, and A, are the eigen- vector and its eigen- value respectively. This equation diagonalizes 
N and RSR^ simultaneously 1 . Note that the orthogonality condition of the vectors a is N 
weighted [18] (see also [2]). 

This approach provides a natural way to compress the data, since here the eigen-values repre- 
sent the relative contribution of the signal with respect to the noise in each mode. Hence, if 
A,- > 1, the mode is dominated by the signal, and it is dominated by the noise otherwise. Note 
also that the eigen- vectors constitute a functional basis, although not complete, which includes 
all the prior knowledge about the data. 

To summarize, this 'optimal' functional basis together with the WF formalism, the constrained 
realization method, and the Bayesian likelihood analysis (parameter estimation) provide a com- 
prehensive and self contained prescription for analyzing the (linear) LSS of the Universe. 
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1 This is similar to the problem of small oscillations in classical mechanics, where the noise and the signal 
matrices are analogous to the kinetic energy and potential energy matrices. 



